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Abstract 

By any account, the 1998 proof of the Kepler conjecture is complex. The thesis underlying this article is 
that the proof is complex because it is highly under-automated. Throughout that proof, manual procedures 
are used where automated ones would have been better suited. This paper gives a series of nonlinear 
optimization algorithms and shows how a systematic application of these algorithms would bring substantial 
simplifications to the original proof. 



1 Introduction 

In 1998 a proof of the Kepler conjecture was completed g. By any account, that solution is complex 
(300 pages of text, 3GB stored data on the computer, computer calculations taking months, 40K lines of 
computer code, and so forth). The thesis underlying this article is that the 1998 proof is complex because 
it is highly under-automated. Throughout that proof, manual procedures are used where automated ones 
would have been better suited. [] 

Ultimately, a properly automated proof of the Kepler conjecture might be short and elegant. The hope 
is that the Kepler conjecture might eventually become an instance of a general family of optimization 
problems for which general optimization techniques exist. Just as today linear programming problems of 
a moderate size can be solved without fanfare, we might hope that problems of a moderate size in this 
family might be routinely solved by general algorithms. The proof of the Kepler conjecture would then 
consist of demonstrating that the Kepler conjecture can be structured as a problem in this family, and 
then invoking the general algorithm to solve the problem. 

As a step toward that objective, this article frames the primary algorithms of that proof in sufficient 
generality that they may be applied to much larger families of problems. The algorithms are arranged 
into four sections: Quantifier Elimination, Linear Assembly Problems, Automated Inequality Proving, 
Plane Graph Generation. 

We do not claim any originality in the algorithms. In fact, the purpose is just the contrary: to 
exhibit the proof of the Kepler conjecture insofar as possible as an instance of standard optimization 
techniques. To keep things as general as possible, the algorithms we present here will make no mention 
of the particulars of the Kepler conjecture. A final section lists parts of the 1998 proof that can be 
structured according to these general algorithms. 



2 Quantifier Elimination 

We might try to structure the Kepler conjecture as a statement in the elementary theory of the real 
numbers. Tarski proved the decidability of this theory, through quantifier elimination. G. Collins and 
others have developed and implemented concrete algorithms to perform quantifier elimination ||. The 
Kepler conjecture, as formulated in ||, is not a statement in this theory, because the transcendental 
arctangent function enters into the statement. 
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However, it seems that the arctangent is not essential to the formulation of the Kepler conjecture, 
and that it enters only because no attempt was made to do without it. For example, it is plausible that 
it can be replaced with a close rational approximation, without doing violence to the proof. In fact, 
the computer calculations in that proof are already based on rational approximations with explicit error 
bounds, and on its rational derivative 1/(1 + x 2 ). 

Assuming this can be done, quantifier elimination gives a procedure to solve the Kepler conjecture. 
Unfortunately, these algorithms are prohibitively slow (exponential, or doubly exponential in the number 
of variables). 

Section [| of this article proposes a different family of optimization problems for which algorithmic 
performance is satisfactory. These are called linear assembly problems. 

Although quantifier elimination is too slow to be of practical value as a 1-step solution to the Kepler 
conjecture, it can be of great value in proving intermediate results. Recent algorithms are able to solve 
problems nearly at the level of difficulty of intermediate results in the Kepler conjecture ||, Jl3| ] . 

Instances of the following families of problems arise as intermediate steps in the Kepler conjecture. 
Each instance of the following families must provide an explicit set of parameters (or dmin, dmax) , 
and the problem becomes to show that configurations of points in R 3 with the given parameters do not 
exist. In theory, the problems are all amenable to solution by quantifier elimination. 

Problem 2.1. Let S be a simplex whose edges all have length at most given parameter values r^. Show 
that there is no point in the interior of the simplex with distance at least r from every vertex. 

Problem 2.2. Show that there does not exist a triangle of circumradius at most r\, and a segment of 
length at most r2 such that the segment passes through the interior of the triangle and such that each 
endpoint of the segment has distance at least r^, from each vertex of the triangle. 

Problem 2.3. Show that there does not exist a configuration of 5 points 0,pi,j?2,P3, <7 with given mini- 
mum dmin(p, q) and maximum dmax(p, q) distances between each pair (p, q) of points. The line through 
(0,5) must link the triangle with vertices pi. 

Problem 2.4. Show that there does not exist a configuration of 6 points 0, p\, . . . ,p4, q, with given 
minimum and maximum distances between each pair (p, q) of points. The line (0, q) must link the skew 
quadrilateral with vertices qi (ordered according to subscripts). 

Problem 2.5. Show that there does not exist a configuration of 7 points 0, pi, . . . ,p±, q\, q 2 with given 
minimum dmin(p, q) and maximum dmax(j), q) distances between each pair (p, q) of points. For j = 1, 2 . 
the line (0, qA must link the skew quadrilateral with vertices (ordered according to subscripts). 

Although we hope that one day these problems will all be amenable to direct solution by quantifier 
elimination, in practice, we did not try to apply quantifier elimination directly without preprocessing 
them. The idea of preprocessing is that if a configuration exists, then the points can be moved in such 
a way to make various upper and lower bound constraints on distances bind. With a large number of 
binding constraints, the dimension of the configuration space becomes smaller and the problem easier 
to solve. In some cases, preprocessing reduces the configuration space to a single configuration, so that 
the existence of the configuration can be tested by choosing coordinates and calculating whether all the 
metric constraints are satisfied. 

These five families of quantifier elimination problems have a similar feel to them. Let us give a 
preprocessing algorithm in general enough terms that it applies uniformly to all five problem families. 

The primary preprocessing of the configurations is a deformation that we call pivoting. Fix any three 
of the points pi, p 2 , and q of the configuration. We call a pivot through axis (pi,P2) the continuous 
motion of q in the perpendicular bisecting plane B of (pi,P2) at constant distance from pi and p 2 - Thus, 
the pivot moves q in a circular path in the plane B. 
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Figure 1: A pivot is the circular motion of a point around a fixed axis. 



Usually, the plane P = (pi,P2><z) through the three points is chosen to have the property that the 
entire configuration lies in a half-space through P. If q moves away from the half-space containing P, 
the distances from q to the other points of the configuration increase or remain the same. If q moves into 
the half-space, the distances decrease or remain the same. 

More generally, we allow the plane P to separate the points of the configuration into two groups, such 
that the lower distance bounds from q to the first group do not bind, and such that the upper distance 
bounds from q to the second group do not bind. 

To apply pivots, we must prescribe their directions, whether into the half-space or away from it. To 
do this, we give a model, which is a configuration that exists, of the form indicated in the problem family, 
but which is not required to satisfy the various constraints. Various edges in the model are marked with 
a strut (indicating a lower bound) or a cable (indicating an upper bound). If the model has a cable, then 
preprocessing pivots are applied to increase the corresponding distance in the configuration space, until 
the given upper bound is reached. Where the model has a strut, pivots are applied so as to decrease 
distances to the lower bound. 

(Bob Connelly has pointed out that some of these problems can be viewed as tensegrity problems, but 
we do not see how to treat them all as tensegrities, so we do not pursue this point of view here. Globally 
rigid tensegrities are analogous to our models. However, our models are not claimed to be rigid.) 



Example 2.6. In Problem 2.1, let the upper bounds on the edges of the simplex be y/8, and let r = 2. 
We take the model to be a regular tetrahedron with edges marked as cables. Mark the edges from the 
circumcenter to the vertices as struts. We apply pivots to the simplex to increase its edges to y8. Move 
the interior point by a sequence of pivots so that it has distance exactly 2 to three of the four vertices of 
the simplex. 

After these pivots are completed, the configuration is uniquely determined, and a calculation with 
explicit coordinates shows that the configuration does not exist, because the distance from the interior 
point to the fourth vertex of the simplex is too small. 



In these problems, when we pivot in the correct direction, the distance constraints between points 



take care of themselves. However, some of the problems impose additional constraints. In Problem 2.1 
the point is constrained to lie in the simplex. In the other problems, lines are required to be linked 
with various space polygons. A separate verification is required to see that pivots do not violate these 
additional constraints. These separate verifications are again quantifier elimination problems, of a smaller 
magnitude than the original problem. 



For example, in Problem 2.1, we verify that the point cannot be an interior point of a face of the 



simplex. This insures that the point does not escape from the interior of the simplex during the sequence 
of pivots. The argument that there does not exist a point in a face, under the given metric constraints, 
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is similar to Example 2.6, but all the arguments are reduced to two dimensions, instead of three. 

The preprocessing in most other cases is similar to Ex amp le 2.6 , and can be reconstructed without 
difficulty from the models. The two exceptions are Problem 2A and Problem 2^3, which require substantial 
preprocessing and a lemma to insure that the pivots can be carried out. (I would much prefer to have 
arguments based on a pure quantifier elimination algorithm and bypass this lemma entirely, but the 
current quantifier elimination algorithms do not seem up to the task.) 

Lemma 2.7. Fix constants ki, e, hi, and £ subject to the constraints 

£i<V8, fee [2,2.61], e>2, /i;e[2,\/8], £e[2,V8]. (1) 

Pick the following parameters in Example \2.{\ 

dmin(p, qj) = hi dmin(<7i, (72) = e dmin( others) = 2 
dmax(0,pi) — £{ dmax(0, qj) — I dm&x(pi,pi + i) = ki 

and let the other values o/dmax be +00. If a configuration of 7 points exists with these parameters, then 



Thomas C. Hales 



5 



a configuration also exists with these parameters and the additional constraints that 

N = 2, \pi - Pi+i\ = hi, \qj\=l. 
Furthermore, the same lemma and conclusion holds in the context of the 6-point configuration of Exam- 



ple 2.4, if we take q\ = q% = q and dmin(<7i, (72) = 0. 



Proof. This is Lemma 4.3 of |Q. Some of the constants have been relaxed in a way that affects the proof 
in a very minor way. (Two modifications must be made to the proof. The assertion that the circumradius 
of a triangle of sides 2.1,2.51,2.51 is less than \[2 of the original must be replaced with the assertion 
that there exists an x > 2 such that the circumradius of the triangle of sides x, 2.61, 2.61 is less than \/2. 
Also, an instance of Problem is needed, with r = 2 and the length-bounds of the sides of the simplex 
2.61, 2.61, V8 at one vertex, and lengths at most V8 at the edges opposite the edges of length at most 
2.61.) □ 



3 Linear Assembly Problems 

In this section we define a class of nonlinear optimization problems that we call linear assembly problems. 

Assume given a topological space X, and a finite collection of topological spaces, called local domains. 
For each local domain D there is a map 7Td : X — > D. There are functions Uj, i = 1, . . . , N, each defined 
on some local domain Di = dom(iii), and we let Xi denote the composite xi = 7Td 4 u %- 

On each local domain D, the functions Ui are related by a finite set of nonlinear relations 

<j>(ui : dom(uj) = D) > 0, <j> € <&d- (2) 

We use vector notation x — (x\, . . . , xjv), with constant vectors c, 6, and matrix A given. 
The problem is to maximize c • x subject to the constraints 

Ax<b, (3) 

and to the nonlinear relations |^. A problem of this form is called a linear assembly problem. (Intuitively, 
there are a number of nonlinear objects Z?, that form the pieces of a jigsaw puzzle that fit together 
according to the linear conditions |[) 

Example 3.1. Assume a single local domain D, and let ttd '■ X = D be the identity map. The function 
f = c • x is nonlinear. The problem is to maximize f over D subject to the nonlinear relations $ d ■ This 
is a general constrained nonlinear optimization problem. 

Example 3.2. Assume that each Ui has a distinct local domain Di = R. Let X = R w , let ttd be the 
projection onto the ith coordinate, and let Xi be the ith coordinate function on R". Assume that is 
empty for each D. The problem becomes the general linear programming problem 

max c • x 

such that Ax < b. 

These two examples give the nonlinear and linear extremes in linear assembly problems. The more 
interesting cases are the mixed cases which combine nonlinear and linear programming. Example [2.3| 
gives one such case. 

Example 3.3. (2D Voronoi cell minimization). Take a packing of disks of radius 1 in the plane. Let 
A be the set of centers of the disks. Assume that the origin £ A is one of the centers. The truncated 
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Figure 3: A truncated Voronoi cell and a subset of the cell lying in a sector 



Voronoi cell at is the set of all x e R 2 such that \x\ < t, and x is closer to the origin than to any other 
center in A. We assume t G (1, y/2). 

Only the centers of distance at most 2t affect the shape and area of the truncated Voronoi cell. For 
each n = 0, 1, 2, . . . , we have a topological space of all truncated Voronoi cells with n nonzero disk centers 
Vi at distance at most 2t. Fix n, and let X be the topological space. 

Let D = Di, i = 1, . . . ,n, be the sectors lying between consecutive segments (0,Vi). Each sector is 
characterized by its angle a and the lengths y a and yi> of the two segments (0,Vi), (0,Vj) between which 
the sector lies. The part A in D of the area of the truncated Voronoi cell is a function of the variables a, 
y a , yb- A nonlinear implicit equation </> = relates A, a, y a , and y h on D. The variables Ui of the linear 
assembly problem for the local domain D are A, y a , y b , a. 

We have a linear assembly problem. The function c-x is the area of the truncated Voronoi cell, viewed 
as a sum of variables A, for each sector D (or rather, their pullbacks to X under the natural projections 
X -> D). 

The assembly constraints are all linear. One linear relation imposes that the angles of the n different 
sectors must sum to 2ir. Other linear relations impose that the variable y a on D equals the variable yb 
on D' if the two variables represent the length of the same segment (0,Vi) in X. 



3.1 Solving linear assembly problems 

In this section we describe how various linear assembly problems are solved in the proof of the Kepler 
conjecture in terms sufficiently general to apply to other linear assembly problems as well. 

Let us introduce some general notation. Let xd = (xi : dom(ui) = D) be the vector of variables with 
local domain D. Write c • x in the form J2d c d ' x d and the assembly conditions as 

Ax = A D x D , 

D 

according to the local domain of the variable. 
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3.1.1 Linear relaxation 

The first general technique is linear relaxation. We replace the nonlinear relations 4>(xd) > 0, 4> € $_d with 
a collection of linear inequalities that are true whenever the constraints $£> are satisfied: A' D X]j < bp- 
A linear program is obtained by replacing the nonlinear constraints $£> with the linear constraints. Its 
solution dominates the nonlinear optimization problem. In this way, the nonlinear maximization problem 
can be bounded from above. 

Let us review some constructions that insure rigor in linear programming solutions. We assume 
general familiarity with the basic theory and terminology of linear programming. It is well-known that 
the primal has a feasible solution iff the dual is bounded. We will formulate our linear programs in such 
a way that both the primal and the dual problems are feasible and bounded. 

We use vector notation to formulate a primal problem as 

max c • x (4) 

such that Ax < b, where x is a column vector of free variables (no positivity constraints), A is a matrix, 
c is a row vector, and b is a column vector. 

We can insure that this primal problem is bounded by bounding each of the variables Xi. (This is easily 
achieved considering the geometric origins of our problem, which provides interpretations of variables as 
particular dihedral angles, edge lengths, and volumes.) We assume that these bounds form part of the 
constraints Ax < b. 

The linear programs we consider have the property that if the maximum is less than a constant K, 
the solution does not interest us. (For instance, in the dodecahedral conjecture, Voronoi cell volumes 
are of interest only if the volume is less than the volume of the regular dodecahedron.) This observation 
allows us to replace the primal problem with one having an additional variable t: 

max c • x + K t (5) 

such that Ax + bt < b, and < t < 1. This modified primal is bounded for the same reasons that the 
original primal is. It has the feasible solution x = and t = 1. 

Lemma 3.4. // the maximum M of the original primal is greater than K, then the optimal solution of 
the modified primal has t = 0, and hence its maximum is also M . 

Proof. Assume that (xo,io) gives an optimal solution to the modified problem for some 1 > to > 0, with 
c ■ x + Kt a > M. Then (xi,t\) — {xn/{l — io),0) is also a feasible solution and it beats the optimal 
solution 

c • x\ + Kti > c ■ x n + Kto- (6) 
This contradiction proves to = 0. □ 

The output from linear program that is solved by numerical methods can be transformed into a 
rigorous bound as follows. Based on the preceding remarks, we assume that these linear programs are 
feasible and bounded. The dual is then also feasible and bounded. We assume that the numerical 
solutions are carried out with sufficient accuracy to insure bounded feasible approximations to the true 
optima. 

To explain the rigorous verification, we separate the equality constraints from the inequality con- 
straints, and rewrite the problem as 

max c • x (7) 
such that A'x = b' , Ax < 6, with x free. The dual problem yields a solution to 



min yb' + zb, 



(8) 
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such that yA' + zA = c, with z > and y free. Let (yo,zo) be a numerically obtained approximation 
to the dual solution. The vector zq will be approximately positive, and by replacing negative coefficients 
by 0, we may assume zq > 0. Let <5 = c — yA' — zA be the error row vector resulting from numerical 
approximations. Then for any feasible solution x of the primal, we have 

c • x = (8 + y Q A' + z A)x < 5 ■ x + y ■ b' + z ■ b. (9) 

Using the bounds of the variables Xi, we bound 5 ■ x < D, and thus obtain the rigorous upper bound 
c • x < D + yo ' b' + zq ■ b on the primal. 

3.1.2 Implementation details 

The linear programs are solved numerically using a commercial package (CPLEX). The input and output 
to these numerical programs are processed by a custom java program, which is linked to CPLEX with a 
java API provided by the software manufacturer. Each bound is calculated with interval arithmetic to 
insure that it is reliable. (We use a simple implementation of interval arithmetic in java based on the 
math.BigDecimal implementation of arbitrary precision arithmetic.) 

3.2 Nonlinear duality 

The second general technique is nonlinear duality. Suppose that we wish to show that the maximum of 
the primal problem |i] is at most M. 

Let x* = (x* D ) be a guess of the solution to the problem, obtained for example, by numerical nonlinear 
optimization. We relax the nonlinear optimization by dropping from the matrix A and the vector b those 
inequalities that are not binding at x* . With this modification, we may assume that Ax* = b. Let m be 
the size of the vector b, that is, the number of binding linear conditions. Let d be the number of local 
domains D. 

We introduce a linear dual problem with real variables t, : (f> G <£>£>, and w £ R m . The variables 
and w are constrained to be non-negative. 

We consider the linear problem of maximizing t such that 

M + dt-c-x* > (10) 
and such that for each xd in each D the linear inequality 

cd ■ (xd - x* D ) -I- r 4><fi( x ) + wA D (x* D - x D ) + t < (11) 

is satisfied. 

There is no guarantee that a feasible solution exists to this system of inequalities. However, any 
feasible solution gives an upper bound M. Indeed, let x = (xd) be any feasible argument to the primal, 
and let t, r^, w be a feasible solution to the dual. Taking the sum of the linear inequalities [0], over D at 
x, we have (recall 4> > and Ax < b): 

M >M + c-{x- x*) + J2d E$ d ^(x) + wA{x* -x)+ dt, 
>c-x+(M + dt-c-x*) + w(b - Ax), 

> c • X. 

Since the dual problem has infinitely many constraints (because of constraints for each i e D), we 
solve the dual problem in two stages. First, we approximate each D by a finite set of test points, and 
solve the finitely constrained linear programming problem for t, r<f,, and w. 

We replace t with to — {—M+c-x*)/d (to make the constraint |l(]bind). It follows from the feasibility 
of t that t > to, and that to, r<f,, w is also feasible on the finitely constrained problem. To show that ^^^,11; 
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satisfies all the inequalities [ll] (under the substitution t i— » to), we use interval arithmetic to show that 
each of these inequalities hold. (To make these interval arithmetic verifications as easy as possible, we 
have chosen the solution to,r,w to make the closest inequality hold by as large a margin t — to as possible. 
This is the meaning of the maximization over t in the dual problem.) The next section will give further 
details about interval arithmetic verifications. 

3.3 Branch and bound 

The third technique is branch and bound. When no feasible solution is found in step (2), it may still be 
possible to partition X into finitely many sets X — ]J Xi, on which feasible solutions to the dual may be 
found. Although this is an essential part of the solution, the rules for branching in the Kepler conjecture 
follow the structure of that problem, and we do not give a general branching algorithm. 

4 Automated Inequality Proving 

What we would like is a general, efficient algorithm for proving inequalities of several real variables. Each 
inequality / < of a continuous function on a compact domain can be expressed as a maximization 
problem: 

max/<0. (12) 

Generally efficient algorithms are not possible because NP hard problems can be encoded as optimization 
problems of quadratic functions |l(J . 

This section describes an inequality proving procedure that has worked well in practice, and which 
could be automated to provide a method of general interest. This section assumes some general familiarity 
with issues of floating-point and interval arithmetic, such as can be found in JlJ, ||. Our methods are 
similar to those in [12] . 

To prove / < 0, it is enough to show that the maximum of / is less than 0. For this reason, we use 
interval arithmetic to bound the maximum of functions. Through interval arithmetic, an interval [a, b] 
containing the range of / can be obtained. By verifying that b < 0, it follows that the range of / is 
negative, and hence that / < 0. 

All our functions can be built from arithmetic operations. (Transcendental functions are replaced 
with explicit rational approximations with known error bounds.) 

Often, the functions / are twice continuously differentiable. To obtain additional speed and accuracy, 
we use interval arithmetic to obtain rigorous bounds on the second partial derivatives of /. (We obtain 
formulas for the second partials through symbolic and automatic differentiation of the function /). With 
bounds on the second partials, we obtain rigorous bounds on / through its Taylor approximation. 

The accuracy of the Taylor approximation improves as the domain shrinks in size. We chop the domain 
into a collection of small rectangles and check on each rectangle whether the Taylor bound implies / < 0. 
If Taylor bound is too crude to give / < 0, we divide it into smaller rectangles and recompute the Taylor 
bounds. By a process of adaptive subdivision of rectangles, the inequality / < is eventually established. 

Derivative information can be used to speed up the algorithm. Taylor bounds can also be applied to 
the first partial derivatives of /. If a partial derivative of a variable x is of fixed sign on a rectangle, then 
the function is maximized along an edge x — a of the rectangle. If this edge is shared with an adjacent 
rectangle, the maximization of / is pushed to an adjacent rectangle. If this edge lies on the boundary of 
the domain, the dimension of the optimization problem is reduced by one. 

The method outline above works extremely well for simple functions in a small number of variables. 
The complexity grows rapidly with the number of variables. We are able to obtain satisfactory results 
for many inequalities that depend on a single simplex S, that is, functions of six variables parameterized 
by the edge lengths of a simplex. 
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4.1 Generative Programming 

Most of the computer code for the proof of the Kepler conjecture implements the Taylor approximations 
of the nonlinear functions. The computer code for proving / < is obtained as follows. 

First, an expression for / is derived. The formulas for the first and second partial derivatives of the 
function are obtained (say by a symbolic algebra system) from the expressions for /. 

These symbolic expressions are then converted to an interval arithmetic format. In a language such 
as CH — h with operator overloading, this can be achieved by defining a class for intervals and overloading 
arithmetic operations so that they may be applied to instances of the class. In languages without operator 
overloading, the conversion from the symbolic expression to computer source code is more involved. 

There are other considerations to bear in mind in producing the interval code. In practice, there is a 
substantial degradation of performance when the rounding mode on the computer is frequently switched, 
and often it is necessary to rearrange the code substantially to reduce the number of changes in rounding 
mode. Also, floating point arithmetic is not associative, so that in order to obtain rigorous results based 
on interval arithmetic, great care must be paid to the placement of parentheses. Another issue is the input 
of floating point constants. In C++, the line of code in |l^ sets x = 1.0, no matter the rounding flags. 
(The constant is parsed at compile time and truncated to 16 digits, and there is no control over rounding 
modes until later, when the program executes.) The code must insure that no errors are introduced 
through compiler constant truncation. 

x = 1.000000000000000000001; // set x=1.0, regardless of rounding flags (13) 

There are many such perils in the production of reliable interval arithmetic code. Overall, a great deal 
of effort must be expended to produce the computer code for rather simple inequalities. This effort must 
be expended every time a new function is introduced into an inequality. This simple fact has kept the 
inequality-proving software developed for the proof of the Kepler conjecture from having more widespread 
applicability to more general inequality proving. 

Figure || shows a snippet of C++ code that computes the arctangent of a linear germ of a function. 

If the Kepler conjecture is eventually to be proved by generic tools, we must find a less cumbersome 
way to produce the computer code. Indeed, a fundamental principle of software design is that there 
should be no manual procedures (Pragmatic Programmer, Tip 61) jnj. Generative programming gives 
methods to automate the production of computer code PJ . There is nothing about the interval arithmetic 
computer code for a new function that requires human thought or effort in an essential way. For example, 
an examination of the code for the arctangent in Figure f| reveals that it is a shallow reformatting of the 
formula for the derivative of the arctangent, combined with the quotient rule in calculus. Why should 
the code be produced by hand, if it the process is entirely mechanical? 

A generative program could be written that takes as its input a function and produces as output the 
interval arithmetic computer code for the Taylor series bounds of that function. The program would 
parse the definition of the function, generate symbolic derivatives of the function, convert the derivative 
information to computer code for calculating the derivatives, and so forth. 

What advantages would this bring? First of all, it would no longer be necessary to read 40K lines of 
check to check the correctness of the proof of the Kepler conjecture. It would be enough to check the code 
on the much smaller generator. Also, the same generator could be used to prove many other inequalities. 



4.1.1 Implementation details 

The generative program has not been written. Some feasibility tests have been made with javaCC for 
parsing and XSLT for abstract syntax tree transformations. 
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Figure 4: Code to calculate an interval version of the arctangent function 

/** 

* A linelnterval is an interval version of a linear approximation to a 

* function in 6 variables . 

* The linear approximation is +f + Df [0] xO + Df[l] xl + Df [2] x2 +...+ Df [5] x5 . 

*/ 

class linelnterval { 
public : 

interval f,Df[6]; 

// rest of class omitted 

>; 
/** 

* Sample implementation of the arctangent function. 

* This computes the linear approximation only. The second derivatives 

* are much more involved. 
*/ 

static linelnterval at an (linelnterval a, linelnterval b) // atan(a/b) ; 
{ 

static const interval one("l"); 
linelnterval temp; 

temp.f = interMath: :atan(a.f/b.f ) ; // computes interval-valued arctangent 
interval rden = one/(a.f *a.f+b.f *b.f ) ; 

for (int i=0;i<6;i++) temp.Df [i] = rden*(a.Df [i] *b. f-b.Df [i] *a.f ) ; 
return temp; 

} 



5 Plane Graph Generation 

A sphere graph is a graph together with an embedding of it into the unit sphere. We discuss a simple 
sphere graph generating algorithm in this section. 

Figure shows a sequence of sphere graphs, giving a sequence of faces that are added to a square to 
produce the graph dual to the edge graph of the rhombic dodecahedron. We can represent the sequence 
abstractly as a directed graph with vertices Vi, . . . ,uu with edges from Vi to Vi + \ . Figure ^| shows that the 
sequence of faces can be generated in different orders, and that all different sequences can be represented 
as a directed graph whose root is the square v\ . Call this the derivation graph. The terminal vertices of 
the directed graph represent sphere graphs isomorphic to v\\. 

In going from a parent to a child, we always add one face. Each sphere graph in the derivation 
graph has two types of faces - those such as the pentagon in V2 that does not survive unmodified in the 
terminal nodes, and those such as the triangle in v-i that do. Call these two types of faces modifiable and 
unmodifiable respectively. 

Let us generalize this construction to generate all the sphere graphs that are needed for the proof of 
the Kepler conjecture. Fix a natural number N . 

Consider the set Vo of nonempty sphere graphs with at most N vertices, and no loops or multiple 
joins. All faces of the sphere graph are assumed to be polygons, and all polygons are assumed to be 
simple. We give each face one of the two attributes modifiable or unmodifiable and call a graph with these 
attributes a decorated graph. Let V be the set of all decorated graphs of Vq. 

Let P be a polygon. We say that a simple polygon Q is an admissible refinement of P if every vertex 
of Q is either a vertex of P or an interior point of P and if Q shares at least one edge with P. An 
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Figure 5: The first few stages and the final few stages of drawing the graph dual to the rhombic dodeca- 
hedron 




Figure 6: The same graphs can be generated by different sequences of adding faces 

admissible refinement of P partitions the region P into the region Q and finitely many other polygons. 

Let v be a decorated sphere graph in V. We say that v' is an admissible refinement of v if there is a 
modifiable face P of v' and an admissible refinement Q of P such that the graph obtained by adding Q 
is v' that is compatibly decorated. We say that v' is compatibly decorated if the unmodifiable faces of 
v' are Q together with the unmodifiable faces of v. A special case occurs, when P = Q, and in this case, 
we simply change the attribute of P to unmodifiable. 

The set V becomes a directed graph T with an edge from each v to all of the admissible refinements 
of v. The root of the directed graph is an empty node. The children of the root are graphs consisting 
of a single polygon dividing the sphere into an interior and exterior, one side modifiable and the other 
not. The terminal vertices in this directed graph are decorated graphs, with no modifiable faces. Thus, 
terminal vertices are in natural bijection with Vo- 

If we take any graph in Vo, it can be reached from the root as follows. Pick a face of Vo and draw 
its edges as the initial polygon. Make the interior of the polygon modifiable. Then continue to pick 
faces that have at least one edge already drawn, and draw all remaining edges of that face, marking the 
completed face as unmodifiable. This corresponds to following a edge in the directed graph T. 



Thomas C. Hales 



13 



This gives us an algorithm to generate all sphere graphs in Vq: begin with the children of the root 
vertex (polygons with at most N vertices) and generate all admissible refinements (that is, follow all 
possible directed edges) until terminal vertices are reached. 

We can improve on this algorithm by fixing for each v € T a modifiable face P and an edge on that 
face, and then taking only admissible refinements Q of P that share the given edge. Each sphere graph in 
Vb is still generated under this restriction. We can also assume without loss of generality that the initial 
polygon is chosen to be one with the most edges. 

There is an enormous combinatorial explosion as all admissible refinements arc generated. As fortune 
has it, we are not interested in all sphere graphs Vb, but rather only those that arise as a potential 
counterexample to the Kepler conjecture. Let V\ C Vq denote this smaller set of relevant graphs. This 
allows us to combine the general graph-generating algorithm with pruning operations that keep the 
combinatorics from getting out of hand. 

What is needed are criteria onuST that allows us to conclude that v has no no admissibly refined 
descendents in Vi, that is, to conclude there is no directed path from v to V\ £ V\. The proof of the Kepler 
conjecture gives a long list of properties of graphs in V\ and this avoids the combinatorial explosion. The 
implementation of the graph generator includes many other minor tricks to keep the execution time 
manageable. Pruning and the other tricks are rather mundane, and we refer the reader to the source 
code for details. 



6 Conclusion 



We will not try to list all of the places where the algorithms of this article would bring a simplification 
of the 1998 proof of the Kepler conjecture. The list would be extensive. To give a rough indication, we 
list a few places these algorithms are relevant to the two shortest articles of the proof || and ||. 

In the article j(| alone, low-dimensional quantifier elimination problems are the subject of Lemma 
1.2, Lemma 1.3, Lemma 1.4, Lemma 1.5, Lemma 1.6, Lemma 1.7, Lemma 1.8, Lemma 1.9, Lemma 
1.11, Lemma 2.1, and Lemma 2.2. In JoJ , an additional low-dimensional quantifier elimination problem 
appears in Lemma 2.2. In general, the parts of the 1998 proof that rely on what that proof calls geometric 
considerations are amenable to preprocessed quantifier elimination. 

Linear assembly algorithms generalize the algorithm presented in Appendix 2. Some examples 
where linear assembly would simplify the 1998 proof are |6| Section 4, and || Proposition 4.1, Proposition 
4.2, Proposition 5.2, Proposition 5.3, Appendix 1 (A. 5 and A. 7). 

Interval arithmetic inequalities are used throughout the 1998 proof, in sections such as ^ Appendix 
3.13.1-4.7.5 and || Section 10, Appendix 1. In the early articles in the series, interval arithmetic Taylor 
approximations are not used 0. As a result, these early papers only prove very limited types of in- 
equalities. The entire strategy of the proof of the Kepler conjecture in is shaped by these algorithmic 
limitations. This profoundly affects the structure of the optimization problem in ||, because a scoring 
function within the reach of the early primitive algorithms was chosen, although such a function was 
highly suboptimal. With improved automated inequality proving algorithms, it should be possible to 
make a fresh start and devise a much more efficient scoring function. 

Graph generation is carried out in Q Section 8. 

We have not yet reached the fundamental objective of avoiding all manual procedures; some parts 
of the proof remain hand-made (even after taking account of the algorithms of this article). A fully 
automated proof would have to develop additional algorithms to prove these estimates. Nevertheless, 
this article brings us one step closer to that objective. 
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